Abstract: Two methods of speech wave analysis using the Hadamard
transform are discussed. The first method is a direct application
of the Hadamard transform to speech waves. The reason this method
yields poor results is discussed. The second method is the
application of the Hadamard transform to a log-magnitude frequency
spectrum. After the application of the Fourier transform the
Hadamard transform is applied to detect a pitch period or to get a
smoothed spectrum. This method shows some positive aspects of the
Hadamard transform for the analysis of a speech wave with regard to
the reduction of processing time required for smoothing, but at the
cost of precision. A formant tracking program for voiced speech is
implemented by using this method and an edge following technique used
in scene analysis.


The views and conclusions contained in this document are those of the
author and should not be interpreted as necessarily representing the
official policies, either expressed or implied, of the Advanced
Research Projects Agency.

1 Introduction.
Recently people in various fields have paid much attention to the
Hadamard transform and have obtained results from its application in
such fields as filter design, voice analyzer/synthesizer and
multiplexer equipment [1]. The Hadamard (or discrete Walsh) transform
is one of the orthogonal transforms using discrete Walsh functions
and has a fast algorithm similar to the Fourier transform [2],[3].


There are many reasons why the Hadamard transform is attractive. Two
major reasons are as follows. First, the Fast Hadamard Transform
algorithm -FHT- uses only add/subtract operations. Multiplication is
not necessary for the FHT. This makes the calculation of the FHT
extremely simple and faster than the Fast Fourier Transform -FFT. In
the Fourier transform case one needs multiplication for the
sine-cosine coefficients, sometimes even with irrational numbers. The
FHT offers quite a simple and an appropriate algorithm when using a
digital computer.
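The add/subtract-only structure can be sketched as follows. This is a
minimal illustration, not the author's program: the function name
`fht`, the natural (Hadamard) output ordering and the lack of
normalization are assumptions made here.

```python
def fht(values):
    """Unnormalized fast Hadamard transform in natural (Hadamard) order."""
    a = list(values)
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    half = 1
    while half < n:
        # Butterfly stage: each pair (x, y) becomes (x + y, x - y).
        # No multiplication by sine-cosine coefficients is needed.
        for start in range(0, n, 2 * half):
            for i in range(start, start + half):
                x, y = a[i], a[i + half]
                a[i], a[i + half] = x + y, x - y
        half *= 2
    return a


signal = [1, 0, 1, 0, 0, 1, 1, 0]
spectrum = fht(signal)
print(spectrum)  # [4, 2, 0, -2, 0, 2, 0, 2]
```

Applying `fht` twice returns the input scaled by N, so the same
routine serves as the inverse transform after division by N.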


Secondly, the discrete Walsh functions give us a general basis for
signal analysis, namely the concept of sequency rather than that of
frequency. The sequency of discrete Walsh functions is defined by
one half of the average number of zero crossings per second. This
concept enables us to replace the concept of frequency of the
sine-cosine functions.

Because of this feature of the Hadamard transform one may well think
of the possibility that all problems which have been solved using the
Fourier transform could be reinterpreted by the Hadamard transform.
Furthermore, one might hope for some interesting new discoveries
since the Hadamard transform might reveal some new aspect of the
problem concerned.


From this optimistic standpoint, the author has attempted an analysis
of the speech wave using the Hadamard transform. Similar attempts
have been made in the past [4], and they have suggested some
possibilities about the application of the Hadamard transform to the
speech wave by showing some correspondence between the frequency
spectrum and the sequency spectrum. This report will show two
methods of speech wave analysis using the Hadamard transform: the
direct and the indirect methods. These two methods show both the
advantages and disadvantages of the Hadamard transform for speech
wave analysis.


Section 2 will explain the direct method, which yields poor results
due to the strong shift sensitivity of the Hadamard sequency
spectrum. Some shift invariant terms of the sequency power spectrum
are known but they are complicated to calculate or too simple to
provide enough information. A few experimental results are shown in
this section to demonstrate these facts.




Section 3 will explain the indirect method and the "hapstrum"
technique. The hapstrum technique is a similar technique to the so
called cepstrum technique [3] except that the FHT is applied to the
log-magnitude frequency spectrum. This technique is indirect in the
sense that at first the FFT (not FHT) is applied to a short span of a
speech wave and then the FHT is used to detect the pitch period or to
get a smoothed spectrum. This technique shows some positive aspect of
the Hadamard transform for the analysis of a speech wave with regard
to smoothing of a spectrum. Some experimental results will
demonstrate this.


A formant tracking program has been implemented using the technique
of an edge follower in scene analysis combined with the hapstrum
technique. Such an approach always contains a pitfall, namely the
problem of wrong way entrance. This will be discussed in section 3.4.

Finally, in section 5 a tentative evaluation will be made of the
Hadamard transform for analyzing speech waves.




2 Direct application of the Hadamard transform
to speech wave analysis.

In this section the Hadamard transform will be directly applied to a
speech wave to get the sequency power spectrum. The existence of
some correspondence between frequency spectrum and sequency spectrum
has been reported in [4]. As a given vocalic sound can be
characterized by the location of its first three formant frequencies,
it is worth investigating the existence of formant "sequencies" in
the Hadamard sequency spectrum instead of formant frequencies. A few
experiments will demonstrate poor results and the reason will be
discussed.
2.1 Definition of sequence and sequency.

The definition of sequency was introduced by H. F. Harmuth [3] and it
gives a new basis from which to investigate the characteristics of
signals. A sequence number of a given function is defined by the
number of sign changes per unit time. Let N = 2^n consecutive real
numbers a(j), 0 <= j <= N-1, be represented by a 1 x N matrix [a(j)].
The Hadamard transform of [a(j)] is

$$ [A(k)] = \frac{1}{N}\,[a(j)]\,H(n) \tag{1} $$

where the N x N Hadamard matrix H(n) is defined recursively in
equation (2).

$$ H(n+1) = \begin{bmatrix} H(n) & H(n) \\ H(n) & -H(n) \end{bmatrix}, \qquad H(0) = [1] \tag{2} $$

Each column of H(n) represents one of the discrete Walsh functions [7].

The sequency is defined by the average number of zero crossings per
unit time divided by 2. Let p(k) be the number of sign changes (zero
crossings) of the k-th column of H(n) defined by eq (2). The sequency
s(k) of that column is

$$ s(k) = \left[ \frac{p(k) + 1}{2} \right] \tag{3} $$

where [x] represents the largest integer which does not exceed x (see
s(k) of Fig. 2.1). It is known that p(k) takes on all values
between zero and N-1 and s(k) takes all values between zero and N/2.
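The two quantities can be tabulated directly from the recursive
definition of the Hadamard matrix. The sketch below assumes the usual
recursion H(n+1) = [[H(n), H(n)], [H(n), -H(n)]] with H(0) = [1] and
the sequency rule s = floor((p + 1) / 2); the function names are
illustrative only.

```python
def hadamard(n):
    """Return the 2**n x 2**n Hadamard matrix as a list of rows."""
    h = [[1]]
    for _ in range(n):
        top = [row + row for row in h]
        bottom = [row + [-x for x in row] for row in h]
        h = top + bottom
    return h


def sign_changes(column):
    """Count sign changes between neighbouring entries."""
    return sum(1 for x, y in zip(column, column[1:]) if x != y)


def sequence_and_sequency(n):
    """Return (p, s): sign changes and sequency for every column of H(n)."""
    h = hadamard(n)
    size = len(h)
    columns = [[h[j][k] for j in range(size)] for k in range(size)]
    p = [sign_changes(col) for col in columns]
    s = [(pk + 1) // 2 for pk in p]
    return p, s


p, s = sequence_and_sequency(3)
print(p)  # [0, 7, 3, 4, 1, 6, 2, 5]
print(s)  # [0, 4, 2, 2, 1, 3, 1, 3]
```

For N = 8 the sequence p takes every value from 0 to N-1 exactly once,
and the sequency s ranges from 0 to N/2, as stated above.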




[Figure: examples of the Hadamard matrix, annotated with
p: sequence of each column and s: sequency of each column.]

Fig. 2.1 Examples of the Hadamard matrix.


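Let us introduce two notations, A(c,s(k)) and A(s,s(k)), for A(k):

$$ A(k) = \begin{cases} A(c, s(k)) & \text{if } p(k) \text{ is even} \\ A(s, s(k)) & \text{if } p(k) \text{ is odd} \end{cases} $$

In analogy with the frequency power spectrum, the sequency power
spectrum at sequency q is defined as follows.

$$ \begin{cases} A^2(c,0) & q = 0 \\ A^2(c,q) + A^2(s,q) & 0 < q < N/2 \\ A^2(s,N/2) & q = N/2 \end{cases} \tag{4} $$

Parseval's theorem is preserved on the coefficients A(k) and
a(j):

$$ \frac{1}{N} \sum_{j=0}^{N-1} a^2(j) = A^2(c,0) + \sum_{q=1}^{(N/2)-1} \left\{ A^2(c,q) + A^2(s,q) \right\} + A^2(s,N/2) \tag{5} $$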
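The Parseval relation follows in one step from the orthogonality of
the Hadamard matrix. The derivation below is a sketch that assumes
the transform is normalized as [A(k)] = (1/N) [a(j)] H(n) and uses
only the property H(n) H(n)^T = N I, which follows from the recursive
definition of H(n).

$$ \sum_{k=0}^{N-1} A^2(k) = [A(k)]\,[A(k)]^{T} = \frac{1}{N^2}\,[a(j)]\,H(n)\,H(n)^{T}\,[a(j)]^{T} = \frac{1}{N} \sum_{j=0}^{N-1} a^2(j) $$

Grouping the terms A^2(k) by sequency, one even-sequence and one
odd-sequence coefficient per sequency, gives the sum over the
sequency power spectrum.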
2.2 Strong shift-sensitivity of the Hadamard sequency spectrum.

As a first step in investigating the existence of formant
sequencies, a program was made to change many consecutive spectra
into a visual pattern, that is, a sonogram of sequencies. A short
time span (12.8 ms.) of a digitized speech wave (sampling rate =
20000 Hz.) is directly transformed into a sequency spectrum. Then
the log magnitude of this spectrum is taken. Many short time
sequency spectra are calculated in this way, are accumulated, and
eventually output to a video screen.


Experimental results are shown in Fig. 2.2. The upper part shows a
speech wave to be analyzed, the middle part a sonogram of frequency
spectra of this speech wave and the lower part a sonogram of sequency
spectra. It is easy to see that the sonogram of sequency spectra (the
lower one) is rougher than that of the frequencies (the middle one).
The formant sequency structure is not clear and it appears to be very
difficult to build a speech wave analysis system based on the
extraction of formant components using the Hadamard sequency
spectrum.


The reason why the sonogram of sequency spectra becomes so rough and
irregular is made clear by the following experiment. The Hadamard
sequency spectrum is calculated for a fixed time span (12.8 msec.
long) of a speech wave. The time span is shifted right by 100
microseconds for each successive calculation of the sequency
spectrum. In other words, calculation of a sequency spectrum is made
each 100 microsecond time-shift. A frequency spectrum of the Fourier
transform is calculated in the same way to make comparison with the
sequency spectrum. The results are shown in Fig. 2.3.

Fig. 2.3 Strong shift-sensitivity of the
Hadamard sequency spectrum. Each
frame is calculated each 100
microsecond time-shift.


From Fig. 2.3 we can easily understand that although the time-shift
is limited to this small value, the shape of consecutive sequency
spectra changes rapidly. The location of a peak which appears to
represent a formant component changes drastically in the next
sequency spectrum. One cannot expect these rapid changes from
observation of the original speech wave since the speech wave does
not appreciably change its shape during 100 microseconds. In
contrast, in the Fourier case, a frequency spectrum does not change
its shape so much during 100 microseconds. This strong
shift-sensitivity of the Hadamard sequency spectrum causes the
irregularity or rough pattern of a sequency sonogram and makes
impossible the application of the pitch-synchronous method.


The strong time-shift sensitivity of a sequency spectrum also can be
explained theoretically. Pichler [6] shows the Hadamard sequency
spectrum is invariant under the dyadic time-shift. Let [b(j)] be
obtained from [a(j)] by the dyadic time-shift t:

$$ [b(j)] = [a(j \oplus t)] $$

where j (+) t, written $j \oplus t$ above, stands for component-wise
modulo two addition (no carry) for the binary representation of j
and t. Pichler's result is written as follows.

$$ B^2(c,q) + B^2(s,q) = A^2(c,q) + A^2(s,q) \tag{6} $$

Unfortunately the Hadamard sequency spectrum is not invariant under
circular time-shift of the input [a(j)]. If [a(j)] is shifted by t
circularly, forming [e(j)], we obtain:

$$ [e(j)] = [a((j + t))] $$

where ((j + t)) is the principal value of j + t modulo N. In
general

$$ E^2(c,q) + E^2(s,q) \neq A^2(c,q) + A^2(s,q) \tag{7} $$


The experiment shown in Fig. 2.3 is not the case of circular
time-shift but one can easily understand that the relation of eq (7)
causes the strong shift sensitivity in the Hadamard sequency spectrum.
Note that in contrast to the Hadamard sequency spectrum a frequency
spectrum of the discrete Fourier transform is invariant under
circular time-shift since the absolute value of a shift operator is one.
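The three shift properties can be checked numerically on a short
test vector. The sketch below is illustrative: the data, the helper
names and the per-coefficient power (rather than the power grouped by
sequency) are assumptions made here. Because every single squared
Hadamard coefficient is unchanged by a dyadic shift, the grouped
sequency power is unchanged as well.

```python
import cmath


def fht(values):
    """Unnormalized fast Hadamard transform, natural order."""
    a = list(values)
    n = len(a)
    half = 1
    while half < n:
        for start in range(0, n, 2 * half):
            for i in range(start, start + half):
                x, y = a[i], a[i + half]
                a[i], a[i + half] = x + y, x - y
        half *= 2
    return a


def hadamard_power(a):
    n = len(a)
    return [(x / n) ** 2 for x in fht(a)]


def dft_power(a):
    n = len(a)
    out = []
    for k in range(n):
        total = sum(a[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        out.append(abs(total / n) ** 2)
    return out


def circular_shift(a, t):
    return [a[(j + t) % len(a)] for j in range(len(a))]


def dyadic_shift(a, t):
    # j XOR t is the component-wise modulo two addition of the binary digits.
    return [a[j ^ t] for j in range(len(a))]


def close(u, v, tol=1e-9):
    return all(abs(x - y) <= tol for x, y in zip(u, v))


a = [3.0, 1.0, -2.0, 5.0, 0.0, 4.0, -1.0, 2.0]
print(close(hadamard_power(dyadic_shift(a, 3)), hadamard_power(a)))    # True
print(close(hadamard_power(circular_shift(a, 1)), hadamard_power(a)))  # False
print(close(dft_power(circular_shift(a, 1)), dft_power(a)))            # True
```

A circular shift of a single sample is already enough to change the
Hadamard power spectrum of this vector, while the Fourier power
spectrum is unaffected.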


2.3 Difficulties in calculating shift invariants
for the Hadamard transform.

Some attempts have been made to define circular time-shift invariants
for the Hadamard transform. Ohnsorg has defined a complete set of
circular time-shift invariants of the Hadamard transform and also has
shown intermediate forms which are invariant to both circular
time-shift and dyadic time-shift. For more detailed derivation of a
complete set of circular time-shift invariants and its intermediate
forms see [7].


As a first step, consider intermediate forms, a set {P(k)} which is a
sum of groups of components in [A(k)] squared such that

$$ \begin{aligned} P^2(0) &= A^2(0) \\ P^2(1) &= A^2(1) \\ P^2(2) &= A^2(2) + A^2(3) \\ &\;\;\vdots \\ P^2(k) &= \sum_{m} A^2(m), \qquad 2^{k-1} \le m < 2^{k} \quad \text{(in general)} \end{aligned} \tag{8} $$


Examples of calculations of a set {P} for various input waves are
shown in Fig. 2.4. In the figure the short time span of the speech
wave for the Hadamard transform is fixed to 12.8 msec. Each
component of a set {P} is shown as a function of time in Fig.
2.4. Overlap of the time span for the next Hadamard transform is 6.4
msec. The case of a sinusoidal wave indicates the filtering
characteristic of a set {P} because the position of each peak moves
to the left as k increases in P(k). In other words, the smaller the
value of k in eq (8), the more likely it is that the component P(k)
will pass the higher frequency component, since frequency increases
with time in the original input wave. However, as the band
of each filter is determined by the number N, which is the dimension
of an array [A(k)], we lose flexibility. Although the calculation of
a set {P} from N components of [A(k)] is straightforward, we can get
only 1 + n (= log2 N) components of P. For instance, if N = 256 one
can get only 9 components of P and one of them is the d.c. component.
This means a great deal of information reduction is made and it is
doubtful if a set {P} contains enough information to perform speech
wave analysis.
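The grouping of eq (8) and its invariance under circular time-shift
can be sketched as follows. The sketch assumes the groups are the
index ranges {0}, {1}, {2,3}, {4..7}, ... of the Hadamard
coefficients in natural order; the function names and the integer
test data are illustrative.

```python
def fht(values):
    """Unnormalized fast Hadamard transform, natural order."""
    a = list(values)
    n = len(a)
    half = 1
    while half < n:
        for start in range(0, n, 2 * half):
            for i in range(start, start + half):
                x, y = a[i], a[i + half]
                a[i], a[i + half] = x + y, x - y
        half *= 2
    return a


def grouped_power(coeffs):
    """Sum squared coefficients over the groups {0}, {1}, {2,3}, {4..7}, ..."""
    groups = [coeffs[0] ** 2]
    low = 1
    while low < len(coeffs):
        groups.append(sum(c * c for c in coeffs[low:2 * low]))
        low *= 2
    return groups


def circular_shift(a, t):
    return [a[(j + t) % len(a)] for j in range(len(a))]


a = [3, 1, -2, 5, 0, 4, -1, 2]
reference = grouped_power(fht(a))
invariant = all(grouped_power(fht(circular_shift(a, t))) == reference
                for t in range(len(a)))
print(invariant)                        # True
print(len(grouped_power([0] * 256)))    # 9
```

The last line reproduces the count given in the text: N = 256
coefficients collapse to 1 + log2 N = 9 numbers, which is the
information reduction the paragraph warns about.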


Ohnsorg has defined another complete set of the Hadamard transform
which has exactly (N/2) + 1 invariants for a circular time-shift.
(The discrete Fourier transform -DFT- gives a (N/2) + 1 point
spectrum.) However it is not straightforward to calculate the
invariants since it includes many matrix multiplications. According
to [7], if we let {J} be a quadratic invariant set of the Hadamard
transform, then


ict 


In the case when N = 8

$$ \begin{aligned} J^2(0) &= A^2(0) \\ J^2(1) &= A^2(1) \\ J^2(2) &= A^2(2) + A^2(3) \\ J^2(3) &= A^2(4) + A^2(6) - A(4)A(7) + A(5)A(6) \\ J^2(4) &= A^2(5) + A^2(7) + A(4)A(7) - A(5)A(6) \end{aligned} \tag{9} $$
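The N = 8 invariants can be checked numerically. The sketch below
assumes the five expressions read A^2(0), A^2(1), A^2(2) + A^2(3),
A^2(4) + A^2(6) - A(4)A(7) + A(5)A(6) and
A^2(5) + A^2(7) + A(4)A(7) - A(5)A(6), with A(k) taken in natural
Hadamard order; the overall 1/N scaling is omitted because it does
not affect invariance. Names and data are illustrative.

```python
def fht(values):
    """Unnormalized fast Hadamard transform, natural order."""
    a = list(values)
    n = len(a)
    half = 1
    while half < n:
        for start in range(0, n, 2 * half):
            for i in range(start, start + half):
                x, y = a[i], a[i + half]
                a[i], a[i + half] = x + y, x - y
        half *= 2
    return a


def quadratic_invariants_n8(A):
    """Right-hand sides of the N = 8 quadratic invariants, as read above."""
    return [
        A[0] ** 2,
        A[1] ** 2,
        A[2] ** 2 + A[3] ** 2,
        A[4] ** 2 + A[6] ** 2 - A[4] * A[7] + A[5] * A[6],
        A[5] ** 2 + A[7] ** 2 + A[4] * A[7] - A[5] * A[6],
    ]


def circular_shift(a, t):
    return [a[(j + t) % len(a)] for j in range(len(a))]


a = [3, 1, -2, 5, 0, 4, -1, 2]
reference = quadratic_invariants_n8(fht(a))
same_for_all_shifts = all(
    quadratic_invariants_n8(fht(circular_shift(a, t))) == reference
    for t in range(8)
)
print(same_for_all_shifts)  # True
```

The last two expressions add up to A^2(4) + A^2(5) + A^2(6) + A^2(7),
so this set refines the coarser grouping of the intermediate forms
into N/2 + 1 = 5 values.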


Fig. 2.4 Calculation of eq (8) for various input waves
(panel shown: sinusoidal wave).


Although there is no explanation about how these terms {J} are
related to frequency or sequency, it is desirable to calculate
Ohnsorg's [7] invariants. Ohnsorg suggests that the prominent
energy lines of the discrete Fourier spectrum tend to be exaggerated
in the quadratic spectrum {J}.


N. Ahmed et al [8] found an efficient algorithm to calculate these
terms. However multiplication by an irrational number is included in
the algorithm and it is more complicated than that of the Hadamard
transform.




3 Hapstrum technique.


In this section the "hapstrum" technique is introduced. The
hapstrum technique is a similar technique to the cepstrum technique
except that the inverse fast Hadamard transform -IFHT- is applied to
the log-magnitude frequency spectrum and the output is called
"hapstrum." This technique is indirect in the sense that at first
the FFT (not FHT) is applied to a short time span of a speech wave to
obtain the spectrum and then the FHT is used to extract pitch period
or to get a smoothed spectrum. The strong time-shift sensitivity of
the Hadamard transform is removed by the first application of the
Fourier transform to speech waves.


This technique illustrates a positive aspect of the Hadamard
transform for the analysis of a speech wave, especially with regard
to the smoothing of a spectrum. A formant tracking program has been
implemented using this technique.

3.1 Outline.


To show both the advantages and disadvantages of the hapstrum
technique we will depict the outline of both the cepstrum and the
hapstrum techniques. Although there is more than one definition of
the cepstrum technique we give a typical application in the upper
part of Fig. 3.1. The hapstrum technique is shown in the lower part
of Fig. 3.1.


From Fig. 3.1 one can easily understand the difference between both
techniques. The frequency spectrum of a short time span of a speech
wave filtered by a Hamming window is obtained by the discrete Fourier
transform -DFT. Then the log-magnitude of this spectrum is taken.
After the processing, in the case of the cepstrum technique the
inverse discrete Fourier transform -IDFT- and DFT are applied to get
pitch period and smoothed spectrum. On the other hand, in the case of
the hapstrum technique the IDFT and DFT are replaced by the IFHT and
FHT, respectively. A hapstrum, which is ordered in sequence (not
sequency), is obtained by the IFHT of a log-magnitude spectrum. From
the replacements one gets the advantage of the fast calculation of
the Hadamard transform. Due to the elimination of linear filtering
computing cost is even further reduced by the method.


We must note that in the cepstrum case after the application of the
IDFT (inverse discrete Fourier transform) we need low-pass filtering
of the log-magnitude of the discrete Fourier transform. By means of
low-pass filtering a smoothed spectrum is obtained due to the
elimination of the fine structure of the spectrum. This is
accomplished by multiplying the cepstrum by a low-pass filter
function.


In contrast to the cepstrum technique, the hapstrum technique uses an
ideal filter as a low-pass filter in the sequence domain of the
hapstrum. Therefore one needs no multiplication to cut higher
sequence components. The higher sequence components are simply made
zero. This also reduces computing cost (the symbols +/- and x in the
figure indicate the necessity of add/subtract operations or
multiplications).

Fig. 3.1 The outline of the cepstrum technique
and the hapstrum technique.


From the author's experience the calculation of the FHT is ten times
as fast as that of the FFT. This suggests that by using the hapstrum
technique we can make the calculation of spectrum smoothing at most
three times as fast as that of spectrum smoothing using the cepstrum
technique.
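The bound can be checked with a rough cost count. The estimate below
is only a sketch: it writes T for the cost of one FFT (a symbol
introduced here), takes the cepstrum smoothing as three FFT-sized
transforms and the hapstrum smoothing as one FFT plus an IFHT and an
FHT at one tenth of T each, and ignores the logarithm, windowing and
filtering costs.

$$ \frac{\text{cepstrum cost}}{\text{hapstrum cost}} \approx \frac{3T}{T + 2\,(T/10)} = \frac{3}{1.2} = 2.5 < 3 $$

The ratio would reach 3 only if the two Hadamard transforms cost
nothing at all, which is why three is an upper limit rather than the
expected gain.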


However, we should be aware that smoothing by the cepstrum gives us a
better approximation of an original log-magnitude spectrum in the
sense of the least-square error criterion and that smoothing by the
hapstrum degrades resolution of peak position of the log-magnitude
spectrum. The theoretical reason for this will be discussed in
section 3.3.


3.2 Pitch detection.


To extract a pitch period we have to take a sufficient time-span of a
speech wave to calculate a log-magnitude spectrum, namely long enough
to include at least two glottal pulses.

In our experiments the duration is taken to be 25.6 msec
corresponding to 512 samples of a digitized speech wave since the
sampling rate of a speech wave is 20000 Hz.


Fig. 3.2-a shows a series of cepstrum plots. A series of cepstra is
calculated for each consecutive segment of speech wave, one half of
which overlaps the previous segment. In the case of the cepstrum, to
get a higher resolution 512 zeros are added to the next 512 samples
of a digitized speech wave. This means the IDFT and DFT are
calculated on 1024 points.


Fig. 3.2-b shows a series of hapstrum plots. The hapstrum is
calculated under the same condition as the cepstrum of Fig. 3.2-a. To
calculate a hapstrum we do not add zeros to the next 512 samples of a
speech wave, since one cannot get higher resolution of the hapstrum
by adding zeros (see 3.3 in this section). If 512 zeros are added to
the next 512 samples of a digitized speech wave one will get a
hapstrum such that the components of the sequence (not sequency) 2i
and 2i + 1 become the same value, where i is a positive integer.
In other words a hapstrum of a speech wave segment with added zeros
is easily calculated from one without added zeros. This special
feature of the Hadamard transform is utilized by the smoothing of the
log-magnitude spectrum in the next section. The proof is shown in
the APPENDIX in a more generalized form.


Comparing Fig. 3.2-a with Fig. 3.2-b, we observe that in the cepstrum
a sharp peak appears at approximately 4.5 msec but in the case of the
hapstrum the peak is not so sharp.

[Figure panels: cepstrum series and hapstrum series; time axis in
msec.]

Fig. 3.2 An example of cepstrum series and
the hapstrum series.


As the pitch period is determined by the location of this sharp peak
it will be more difficult to extract the pitch period from the
hapstrum. The cepstrum is superior to the hapstrum in so far as pitch
detection is concerned.


3.3 Smoothing of a spectrum.


The resonant frequencies of a vocal tract are called formant
frequencies, and the first three characterize a given vocalic sound.
Therefore, for the analysis of a speech wave it is very important to
extract these three frequencies. The procedure is called a formant
tracker.


There exist two methods to extract formant frequencies. One is the
linear prediction method which extracts these frequencies directly
from a given speech wave. In other words the formant extraction is
performed in the time domain. Atal et al [9] reported good results
from the method. The other method is based on peak detection in the
frequency domain of a speech wave [12].


In Fig. 3.3 an example of a log-magnitude spectrum of a short-time
speech wave is shown. The figure suggests that a log-magnitude
spectrum is composed of its spectral envelope and the spectral
fine-structure.

Roughly speaking, the spectral fine-structure has equidistant peaks
at the pitch (fundamental) frequency and harmonics.


As formant frequencies are represented by several prominent peaks in
a spectral envelope, smoothing or elimination of fine-structure is
important. The cepstrum technique is one of the prominent methods
for it [5],[17] but its computational speed is rather slow since it
includes three FFT calculations. With respect to this point the
hapstrum is faster at the cost of degradation of accuracy of peak
positions in the smoothed log-magnitude spectrum (see eq (11) and
(12)).


The hapstrum technique is based on filtering in the sequence domain
as shown in Fig. 3.1. The output of the IFHT, which is in sequence
order, is a hapstrum. After the detection of a pitch period from the
hapstrum, all hapstrum components with more than a fixed sequence
number are set to zero. This is accomplished by an ideal filter in
the sequence domain. With the higher sequence components cut from
the hapstrum the FHT is used to recover an original log-magnitude
spectrum.
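The smoothing step can be sketched end to end on a toy spectrum.
This is an illustration under stated assumptions, not the author's
program: "sequence order" is taken to mean ordering the Hadamard
coefficients by their number of sign changes (obtained here by a
Gray code followed by bit reversal), and the names `hapstrum_smooth`
and `cutoff` are hypothetical.

```python
def fht(values):
    """Unnormalized fast Hadamard transform, natural order."""
    a = list(values)
    n = len(a)
    half = 1
    while half < n:
        for start in range(0, n, 2 * half):
            for i in range(start, start + half):
                x, y = a[i], a[i + half]
                a[i], a[i + half] = x + y, x - y
        half *= 2
    return a


def bit_reverse(value, bits):
    result = 0
    for _ in range(bits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def sequence_ordered_fht(values):
    """FHT whose output index equals the number of sign changes."""
    a = fht(values)
    bits = len(a).bit_length() - 1
    return [a[bit_reverse(w ^ (w >> 1), bits)] for w in range(len(a))]


def hapstrum_smooth(log_spectrum, cutoff):
    n = len(log_spectrum)
    hapstrum = [x / n for x in sequence_ordered_fht(log_spectrum)]  # IFHT
    for j in range(cutoff, n):
        hapstrum[j] = 0.0  # ideal low-pass filter: no multiplication needed
    return sequence_ordered_fht(hapstrum)  # FHT back to the spectrum


log_spectrum = [1.0, 3.0, 5.0, 7.0, 2.0, 4.0, 6.0, 8.0]
smoothed = hapstrum_smooth(log_spectrum, cutoff=4)
print(smoothed)  # [2.0, 2.0, 6.0, 6.0, 3.0, 3.0, 7.0, 7.0]
```

With the cutoff at half the length, every output pair equals the mean
of the corresponding input pair. The smoothing is therefore cheap,
but it can place a peak only to within the width of one block.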


The determination of the cut-off sequence number is as follows. Let
a hapstrum be represented by an array [h(j)] of dimension N (= 2^n)
and the location of a peak caused by pitch frequency be an index
number k of [h(j)]. If N/(2^(i+1)) <= k < N/(2^i) then the cutoff
sequence number r is

$$ r = \frac{N}{2^{\,i+1}} \tag{10} $$
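The cut-off rule can be written as a small function. The sketch
assumes the rule reads: find i with N/2^(i+1) <= k < N/2^i and take
r = N/2^(i+1), that is, the largest power-of-two fraction of N that
does not exceed the pitch peak index k. The function name and the
example values are illustrative.

```python
def cutoff_sequence_number(n_points, k):
    """Return r = N / 2**(i + 1), where N / 2**(i + 1) <= k < N / 2**i."""
    if n_points <= 0 or n_points & (n_points - 1):
        raise ValueError("N must be a power of two")
    if not 1 <= k < n_points:
        raise ValueError("peak index must satisfy 1 <= k < N")
    r = n_points // 2
    while r > k:
        r //= 2
    return r


print(cutoff_sequence_number(512, 90))   # 64
print(cutoff_sequence_number(512, 300))  # 256
```

Choosing r at or below the pitch peak guarantees that the peak
itself, and everything above it in sequence, is removed by the ideal
filter.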
Fig. 3.3 Spectral envelope and spectral fine
structure of log-magnitude spectrum of
a speech wave.
Consider the meaning of filtering by an ideal filter in the sequence
domain. Let an array [a(j)] of dimension N (= 2^n) be a digitized
signal in which all components such that N/2 <= j < N are set to zero.
By the application of the FHT including sequence ordering the array
[a(j)] is transformed into an array [B(k)] such that each adjacent
component becomes the same, namely:

$$ \begin{aligned} B(0) &= B(1) \\ B(2) &= B(3) \\ &\;\;\vdots \\ B(N-2) &= B(N-1) \end{aligned} \tag{11} $$


Furthermore, when all components such that N/(2^2) <= j < N are set
to zero the array [a(j)] is transformed into such an array [B(k)] by
the application of the FHT including sequence ordering:

$$ \begin{aligned} B(0) = B(1) &= B(2) = B(3) \\ B(4) = B(5) &= B(6) = B(7) \\ &\;\;\vdots \\ B(N-4) = B(N-3) &= B(N-2) = B(N-1) \end{aligned} \tag{12} $$


Eq (11) and (12) are generalized in (A) and (B) of APPENDIX.
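Both relations are easy to confirm on a small array. The sketch
below assumes that "FHT including sequence ordering" means a
Hadamard transform whose outputs are sorted by number of sign changes
(Gray code followed by bit reversal of the index); the helper names
and the data are illustrative.

```python
def fht(values):
    """Unnormalized fast Hadamard transform, natural order."""
    a = list(values)
    n = len(a)
    half = 1
    while half < n:
        for start in range(0, n, 2 * half):
            for i in range(start, start + half):
                x, y = a[i], a[i + half]
                a[i], a[i + half] = x + y, x - y
        half *= 2
    return a


def bit_reverse(value, bits):
    result = 0
    for _ in range(bits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def sequence_ordered_fht(values):
    """FHT whose output index equals the number of sign changes."""
    a = fht(values)
    bits = len(a).bit_length() - 1
    return [a[bit_reverse(w ^ (w >> 1), bits)] for w in range(len(a))]


# Upper half zero: adjacent pairs of the output coincide.
half_zero = sequence_ordered_fht([5, 3, -2, 7, 0, 0, 0, 0])
print(half_zero)     # [13, 13, 3, 3, 11, 11, -7, -7]

# Upper three quarters zero: groups of four coincide.
quarter_only = sequence_ordered_fht([5, 3, 0, 0, 0, 0, 0, 0])
print(quarter_only)  # [8, 8, 8, 8, 2, 2, 2, 2]
```

The flat runs are exactly what limits the resolution of peak
positions after smoothing: within a run all positions are
indistinguishable.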




Both equations suggest that if [B(k)] is plotted as a function of
array index k the curve becomes flat, as the value of each adjacent
component is the same. This flattening effect degrades the
resolution of peak positions in the array [B(k)]. To demonstrate
this an example is shown in Fig. 3.4. A segment of a speech wave is
shown and is analyzed by the hapstrum technique. The two lower
curves represent the log-magnitude spectrum and the smoothed result
by the hapstrum technique.


The smoothed log-magnitude spectrum in Fig. 3.4 demonstrates the
smoothing effect stated before by eq (11). Many sharp maxima and
minima caused by glottal pulses (or pitch frequency) in the original
log-magnitude spectrum are diminished in the smoothed spectrum.
From the author's experience the number of peaks is decreased to one
half that of the original.


The important question is whether or not the prominent peaks caused
by resonance of a vocal tract are preserved by the smoothing. From
the example shown in Fig. 3.4 we can see that the hapstrum smoothing
technique gives good smoothing with regard to preserving the first
three formants.


Fig. 3.5 gives another example which suggests the formant components
are preserved after the smoothing by the hapstrum technique. The
upper is a sonogram of spectra without smoothing and the lower is a
sonogram of smoothed results using the hapstrum technique.

Fig. 3.4 Speech wave, the log-magnitude spectrum and
the smoothed spectrum by the hapstrum technique.

Fig. 3.5 Sonograms of log-magnitude spectra
and their smoothed spectra. The upper is
a sonogram of log-magnitude spectra and
the lower is that of smoothed spectra.

3.4 A formant tracking program as an application of
the hapstrum technique.


A formant tracking program has been implemented using the hapstrum
technique and an edge follower technique as used in scene analysis
[11]. In principle, the formant tracking program presented here
accepts any kind of smoothing technique such as cepstrum or inverse
filtering [12].


Edge followers were first implemented to recognize objects in a
scene. An edge follower detects a position where a sharp change of
contrast occurs and follows it successively. A sonogram is just such
a scene with formant trajectories represented as dark stripes. By
detecting dark stripes we find the locations of peaks in a spectrum
since a sonogram is represented as a sequence of spectra.


There are many difficulties in implementing a formant tracking
program based on an edge follower. One problem is that a formant
trajectory is not a straight line, but is curved. Some of the edge
followers have treated objects composed only of straight lines, such
as cubes. This limitation can be of use to an edge follower. For
instance we can prevent the following of the wrong path by using the
criterion of curvatures. We also can forecast the existence of an
edge, which is hard to detect because of noise, by using straight line
interpolation methods. As the production of a speech wave is a
dynamic and stochastic process, the human speech wave contains much
noise.


A second problem is that it is very difficult to decide a formant
frequency from local information. A wide range of overlap exists
between the region of the first formant frequency and that of the
second, also between the second formant frequency and the third. In
the case of a male voice, the first formant frequency ranges from 200
Hz. to 900 Hz., the second from 550 Hz. to 2700 Hz., and the third
from 1100 Hz. to 3000 Hz.


A third problem is that if we see a sonogram in a microscopic way
there exist too many peaks to discriminate the formant components. It
is desirable to have a technique to eliminate trivial peaks while
preserving prominent peaks caused by the first three formants.


Cepstrum is such a technique. Rabiner and Schafer [13] have
implemented a formant tracking program based on the cepstrum
technique. As their method makes frame-by-frame decisions for the
first three formant frequencies, they use only local information in a
sonogram. It is desirable to utilize more global information.


Markel [12] has developed a very good technique for getting a smoothed
spectrum based on the idea of the linear prediction method. He calls it
inverse filtering and has developed a formant tracking program [13]
which uses information from the previous frame when it is difficult
to determine the first three formant frequencies.

The formant tracking program explained here follows Markel's approach
[13] but with a backtracking mechanism to recover if a wrong path is
followed. If decisions are made frame by frame there is no wrong way
entrance problem. Even if we make a wrong decision in a frame, the
effect does not propagate to the next. However, if we use the
information from just the previous frame the effect of a wrong
decision will propagate. To cope with this situation it is necessary
to have a recovery technique which utilizes more global information.


3.4.1 Logical structure of a formant tracking program.


Our formant tracking program is composed of four modules named PEAK
DETECTOR, CANDIDATE SELECTOR, TRACKER and RECOVERY. The general flow
of the program is shown in Fig. 3.6.


A. PEAK DETECTOR.


PEAK DETECTOR accepts a digitized speech wave of a vocalic sound,
calculates a smoothed spectrum by using the hapstrum technique, and
determines peaks. It should be noted that the hapstrum technique is
used to decrease the processing time required for smoothing. It can
easily be replaced by another technique such as inverse filtering or
the cepstrum technique.


B. CANDIDATE SELECTOR.


For each region of the first three formant frequencies, CANDIDATE
SELECTOR selects at most three candidates from many peaks detected by
PEAK DETECTOR and orders them by amplitude of peaks. The third
candidate whose amplitude is 7.5 db less than that of the second
candidate is removed by the ordering process. These selected
candidates are accumulated and are used by TRACKER and RECOVERY. This
routine reduces the search space.
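The selection rule can be sketched as follows. The data layout
(peaks as pairs of frequency in Hz and amplitude in dB), the function
name, and the reading of "7.5 db less" as "at least 7.5 dB below the
second candidate" are assumptions of this sketch, not details given
in the text.

```python
def select_candidates(peaks, low_hz, high_hz, drop_db=7.5):
    """Pick at most three peaks inside a formant region, strongest first.

    peaks: list of (frequency_hz, amplitude_db) pairs.
    The third candidate is dropped when it lies drop_db or more below
    the second one.
    """
    in_region = [p for p in peaks if low_hz <= p[0] <= high_hz]
    ordered = sorted(in_region, key=lambda p: p[1], reverse=True)[:3]
    if len(ordered) == 3 and ordered[1][1] - ordered[2][1] >= drop_db:
        ordered.pop()
    return ordered


peaks = [(300, 40.0), (450, 38.0), (700, 29.0), (1500, 45.0)]
print(select_candidates(peaks, 200, 900))
# [(300, 40.0), (450, 38.0)]
print(select_candidates(peaks, 550, 2700))
# [(1500, 45.0), (700, 29.0)]
```

Keeping at most three ranked candidates per region is what bounds the
search space for the later tracking and recovery steps.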


C. TRACKER.

TRACKER takes the results from CANDIDATE SELECTOR and makes a
tentative decision for the first three formant frequencies. At first
TRACKER looks for a reasonable place to track. There exists a region
within which an overlap of two formant components never occurs. In
the case of a male voice, only the first formant exists between 200
Hz. and 550 Hz., and only the second and the third formant exist
between 900 Hz. and 1100 Hz., and between 2700 Hz. and 3000 Hz.,
respectively. If the first candidate for a formant frequency is
within the first region it is reasonable to assume that this is the
peak caused by the formant. After making an initial selection TRACKER
begins tracking forward or backward.
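The regions where only one formant can occur follow from the formant
ranges by interval arithmetic. The sketch below takes the male-voice
ranges as 200 to 900 Hz, 550 to 2700 Hz and 1100 to 3000 Hz; these
values are an assumed reading of the text, and the function name is
illustrative.

```python
FORMANT_RANGES_HZ = {1: (200, 900), 2: (550, 2700), 3: (1100, 3000)}


def exclusive_regions(ranges):
    """Map each formant to the intervals covered by its own range only."""
    edges = sorted({edge for low, high in ranges.values() for edge in (low, high)})
    result = {name: [] for name in ranges}
    for low, high in zip(edges, edges[1:]):
        # Formants whose range covers this elementary interval completely.
        owners = [name for name, (r_low, r_high) in ranges.items()
                  if r_low <= low and high <= r_high]
        if len(owners) == 1:
            result[owners[0]].append((low, high))
    return result


print(exclusive_regions(FORMANT_RANGES_HZ))
# {1: [(200, 550)], 2: [(900, 1100)], 3: [(2700, 3000)]}
```

A candidate peak falling inside one of these intervals can be
assigned to its formant without ambiguity, which makes it a safe
starting point for tracking.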


Fig. 3.6 General flow diagram of a formant tracking program.

TRACKER uses two criteria to determine formant frequencies of the
next frame. Basically, TRACKER uses a criterion of minimum shift of
peak position from one frame to the next. This nearest neighbour
criterion is used as long as merging of two formant frequencies does
not occur. As soon as merging of two formant frequencies occurs,
TRACKER makes use of the other criterion to look for a point of
separation after merging. After a tentative selection of the next
formant components using the first criterion, TRACKER looks for a
peak whose position is within a reasonable range from the peak one
frame before. If TRACKER can find such a peak it will select the
peak as the point of separation of the two trajectories; otherwise
the two trajectories remain merged. A wrong decision by TRACKER is
corrected by the RECOVERY routine.
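
The two criteria can be illustrated with a short Python sketch. It is not the original program: the function names, the list-of-frequencies representation of a frame and the 150 Hz default for a "reasonable range" are assumptions made only for this example.

```python
from typing import List, Optional


def nearest_peak(prev_hz: float, candidates: List[float],
                 max_shift_hz: float) -> Optional[float]:
    """Minimum-shift criterion: closest candidate, or None if it is too far away."""
    if not candidates:
        return None
    best = min(candidates, key=lambda f: abs(f - prev_hz))
    return best if abs(best - prev_hz) <= max_shift_hz else None


def track_forward(start_hz: float, frames: List[List[float]],
                  max_shift_hz: float = 150.0) -> List[Optional[float]]:
    """Follow one formant through successive frames of candidate peaks."""
    trajectory: List[Optional[float]] = []
    current = start_hz
    for candidates in frames:
        chosen = nearest_peak(current, candidates, max_shift_hz)
        trajectory.append(chosen)
        if chosen is not None:
            current = chosen  # a missing peak leaves the reference unchanged
    return trajectory


def find_separation(prev_merged_hz: float, chosen_hz: float,
                    candidates: List[float], max_shift_hz: float) -> Optional[float]:
    """Second criterion: after a merge, look for another peak near the merged one."""
    others = [f for f in candidates
              if f != chosen_hz and abs(f - prev_merged_hz) <= max_shift_hz]
    if not others:
        return None  # the two trajectories remain merged
    return min(others, key=lambda f: abs(f - prev_merged_hz))
```

A frame in which no candidate lies within the allowed shift yields `None` in the trajectory, which is the kind of gap the RECOVERY routine is described as handling.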


D. RECOVERY.


RECOVERY works when some inconsistency is recognized by a formant
tracking program. An inconsistency is a discontinuity or a sharp
change in following a formant trajectory.


There are two major reasons why TRACKER follows a path that has a
sharp change from one frame to the next. The first reason is that a
prominent peak caused by a formant component is often lost in a
spectrum because of the stochastic movement of glottal pulses. This
results in a discontinuity in a formant trajectory if the trajectory
is selected in a microscopic way. This can be resolved by using the
neighborhood information. A FORECASTER works in this case.


The second reason is that a wrong decision has been made in the past
by TRACKER and a wrong path has been followed as a formant
trajectory. For example, a formant tracking program has mistaken the
first formant for the second and the trajectory suddenly enters into
the region where the second formant does not exist; namely the region
between [illegible] Hz and 565 Hz. The other typical example is the
following of a wrong path which is not a formant trajectory and
eventually disappears. These are corrected by using a backtracking
mechanism in the RECOVERY routine. In the previous example, after
the RECOVERY routine has recognized an error, a trajectory followed
as the second formant is replaced by the first formant trajectory.
Then another peak is selected for the second formant frequency by
RECOVERY. The routine extends the new second formant trajectory
backward by using TRACKER and FORECASTER. This is an example of how
the backtracking mechanism works. We can see that a recovery
process by backtracking has to have a recursive structure, but in
our case the depth of recovery is limited to one.
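
The kind of inconsistency that triggers RECOVERY can be sketched in Python. This is a hypothetical helper, not the original routine: the name `first_inconsistency`, the jump threshold and the bounds of the region that a formant should not enter are all supplied by the caller as assumptions.

```python
from typing import List, Optional, Tuple


def first_inconsistency(trajectory: List[float], max_jump_hz: float,
                        forbidden: Optional[Tuple[float, float]] = None) -> Optional[int]:
    """Return the first frame index where the trajectory looks wrong, else None.

    A frame is flagged when the trajectory enters the `forbidden` region
    (where this formant is assumed not to exist) or when it jumps by more
    than `max_jump_hz` from the previous frame.
    """
    for i, freq in enumerate(trajectory):
        if forbidden is not None and forbidden[0] <= freq <= forbidden[1]:
            return i
        if i > 0 and abs(freq - trajectory[i - 1]) > max_jump_hz:
            return i
    return None
```

A backtracking step would restart tracking from the frame before the returned index; with the depth of recovery limited to one, as in the text, a second failure would not trigger another backtrack.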
An example of formant tracking obtained from the program is shown in
Fig. 3.7. The spoken sentence is "We were away." More than 90% of
running time is devoted to PEAK DETECTOR and CANDIDATE SELECTOR.

Fig. 3.7 An example of the first three formant
trajectories for a sentence of "We were away".


4 Conclusion.


In this paper we have discussed both the advantages and disadvantages
of the Hadamard transform as compared to the Fourier transform for
speech wave analysis. The experiments in section 2 proved that
application of the Hadamard transform directly to speech waves yields
poor results. It fails to extract important features. The smaller
the number of features necessary to accurately represent a speech
wave, the better. In the Fourier case, in almost all cases for
a vocalic sound a speech wave is represented by the first three
formant frequencies and a pitch (fundamental) frequency. Only four
parameters are needed. However in the Hadamard sequency spectrum, we
cannot observe any typical features because of the strong time-shift
sensitivity. This also makes it impossible to apply even a pitch
synchronous averaging. In other words typical features which are
recognizable in the Fourier case are averaged and are scattered away
in a wide range of a sequency power spectrum. Some of the experiments
in section 2.2 demonstrate it.


Time-shift invariants for the Hadamard sequency spectrum are known.
One of these defined by eq. (8) does not bear enough information to
perform a speech wave analysis since from a digitized speech wave
composed of 256 points, we get only [illegible] components. Although
each of these components has some relationship with an output from a
filter, each frequency band is determined by the number of points
transformed. Ohnsorg [7] has defined another complete set of the
Hadamard transform which has exactly the same number of components as
a Fourier frequency spectrum and is invariant under a circular
time-shift. [illegible] et al. [8] found an algorithm to calculate
these terms. However multiplication by an irrational number is
included and it is more complicated than that of the fast Hadamard
transform. As Ohnsorg suggests that the prominent energy line of the
Fourier spectrum tends to be exaggerated, it is desirable to calculate
Ohnsorg's invariants for a speech wave.


In section 3 the hapstrum technique is introduced. This technique
is similar to the so called cepstrum technique except that the FHT is
applied to the log-magnitude frequency spectrum. After the
application of the Fourier transform the Hadamard transform is
applied to detect a pitch period or to get a smoothed spectrum. This
technique shows some positive aspect of the Hadamard transform for
the analysis of a speech wave with regard to the reduction of the
processing time required for smoothing. Good smoothing makes it
easy to extract the first three formant frequencies in a spectrum.
We should note that smoothing by the hapstrum is obtained at the cost
of accuracy in determining peak position of a smoothed spectrum. This
is explained by eq. (11) or (22) in section 3.4. We can conclude that
precise formant frequencies are obtained by the cepstrum technique at
the cost of processing time, while a reduction of processing time is
obtained by the hapstrum technique at the cost of accuracy in
determining the formant frequencies. However, it is often true that
to detect a peak caused by a pitch period is difficult even in the
case of a male voice. The author's original optimistic standpoint was
that the Hadamard transform might reveal some new aspect of speech
waves. However the only gain found from using the Hadamard transform
was the reduction of processing time required for smoothing, and this
was obtained at the cost of precision.
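
The hapstrum smoothing summarized above can be sketched in Python using only the standard library. Several things here are assumptions rather than the author's implementation: the function names, the naive DFT used in place of an FFT, and the choice to keep the `keep` lowest-sequency components (with `keep` a power of two) before transforming back.

```python
import cmath
import math
from typing import List


def log_magnitude_spectrum(frame: List[float]) -> List[float]:
    """Log-magnitude of the DFT (naive O(N^2) DFT, adequate for a short sketch)."""
    n = len(frame)
    spec = []
    for k in range(n):
        acc = sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(frame))
        spec.append(math.log(abs(acc) + 1e-12))  # small offset avoids log(0)
    return spec


def fwht(values: List[float]) -> List[float]:
    """Fast Hadamard transform in natural order; only additions and subtractions."""
    a = list(values)
    n = len(a)  # must be a power of two
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a[i], a[i + h] = a[i] + a[i + h], a[i] - a[i + h]
        h *= 2
    return a


def natural_index(sequency: int, bits: int) -> int:
    """Natural-order index of the Walsh function with the given sequency."""
    gray = sequency ^ (sequency >> 1)
    return int(format(gray, "0{}b".format(bits))[::-1], 2)  # bit reversal


def hapstrum_smooth(log_spec: List[float], keep: int) -> List[float]:
    """Zero all but the `keep` lowest-sequency components and transform back."""
    n = len(log_spec)
    bits = n.bit_length() - 1
    coeffs = fwht(log_spec)  # Hadamard transform of the log spectrum
    kept = {natural_index(s, bits) for s in range(keep)}
    filtered = [c if i in kept else 0.0 for i, c in enumerate(coeffs)]
    return [v / n for v in fwht(filtered)]  # H is its own inverse up to 1/N
```

With `keep` a power of two the result is piecewise constant: each block of `N / keep` adjacent spectrum points is replaced by its mean. That makes the loss of precision concrete, since a peak can then be located only to within one block. A call such as `hapstrum_smooth(log_magnitude_spectrum(frame), keep=8)` chains the two steps for a frame whose length is a power of two.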


A formant tracking program using an edge follower has been described
in section 3.4. While the algorithm is rather sophisticated, most of
the time is still devoted to the smoothing and peak selection
procedures.
5 APPENDIX.

Let us define a few of the functions used here.

Function G(U) is defined as follows:
Let a binary representation of U or G(U) be

$$G(U) = g_{n-1}\, g_{n-2} \ldots g_1\, g_0$$
$$U = u_{n-1}\, u_{n-2} \ldots u_1\, u_0$$
$$g_i \text{ and } u_i \in \{0, 1\} \quad (\text{for } 0 \le i \le n-1)$$

where

$$
\begin{aligned}
g_{n-1} &= u_0 \;\mathrm{XOR}\; u_1 \\
g_{n-2} &= u_1 \;\mathrm{XOR}\; u_2 \\
&\;\;\vdots \\
g_1 &= u_{n-2} \;\mathrm{XOR}\; u_{n-1} \\
g_0 &= u_{n-1}
\end{aligned}
\tag{A-1}
$$

(XOR stands for exclusive-or)
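
As a check on this reading of eq. (A-1), the following Python sketch computes G as a bit-reversed Gray code and compares it with the number of sign changes in the rows of a natural-order Hadamard matrix. The function names are hypothetical, and the matrix is assumed to be the usual recursively defined one of size 2^n.

```python
def G(u: int, n: int) -> int:
    """Bit-reversed Gray code of the n-bit number u, as read from eq. (A-1)."""
    gray = u ^ (u >> 1)  # gray_i = u_i XOR u_{i+1}, and gray_{n-1} = u_{n-1}
    g = 0
    for i in range(n):   # reverse the n bits: g_{n-1-i} = gray_i
        if (gray >> i) & 1:
            g |= 1 << (n - 1 - i)
    return g


def sign_changes(row_index: int, n: int) -> int:
    """Number of sign changes along one row of the natural-order matrix H(n)."""
    row = [(-1) ** bin(row_index & j).count("1") for j in range(1 << n)]
    return sum(1 for a, b in zip(row, row[1:]) if a != b)
```

Under these assumptions, row G(s) of the matrix has exactly s sign changes, which is the relation between an element's position and its sequence number that the proof below relies on.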
Let an array [a(j)] be [e,f] such that

$$[e,f] = [e_0, e_1, \ldots, e_{m-1}, f_0, f_1, \ldots, f_{m-1}]$$
$$[e] = [e_0, e_1, \ldots, e_{m-1}]$$
$$[f] = [f_0, f_1, \ldots, f_{m-1}]$$

From the definition of the Hadamard transform

$$[A(j)] = (1/N)\,[e,f] \begin{bmatrix} H(n-1) & H(n-1) \\ H(n-1) & -H(n-1) \end{bmatrix} \tag{A-2}$$

where $N = 2^n$ and $m = N/2$.
5.1 Filtering on the sequence domain.

(A) If array $[a(j)] = [e_0, e_1, \ldots, e_{m-1}, 0, \ldots, 0]$, then $A(k) = A(l)$ for $l = k + (N/2)$,
where $0 \le k \le (N/2) - 1$, and the difference of sequence
number between A(l) and A(k) is one.

Proof:


Suppose the sequence number of the k-th or l-th element of
array [A(j)] is s or t respectively. Then if the binary
representation of k, l, s or t is

$$
\begin{aligned}
k &= k_{n-1}\, k_{n-2} \ldots k_1\, k_0 \\
l &= l_{n-1}\, l_{n-2} \ldots l_1\, l_0 \\
s &= s_{n-1}\, s_{n-2} \ldots s_1\, s_0 \\
t &= t_{n-1}\, t_{n-2} \ldots t_1\, t_0
\end{aligned}
\tag{A-3}
$$

then

$$k = G(s) \text{ and } l = G(t) \quad (\text{see [3]}) \tag{A-4}$$




Since $0 \le k \le (N/2) - 1$ and $l = k + (N/2)$, the
most significant binary digits $k_{n-1}$ and $l_{n-1}$ are

$$k_{n-1} = 0, \quad l_{n-1} = 1, \quad \text{and} \quad k_i = l_i \ \text{ for } i \ne n-1 \tag{A-5}$$


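From eq (A-4) and (A-5)

$$
\begin{aligned}
l_{n-1} &= t_0 \;\mathrm{XOR}\; t_1 = 1 \\
k_{n-1} &= s_0 \;\mathrm{XOR}\; s_1 = 0 \\
k_{n-2} &= s_1 \;\mathrm{XOR}\; s_2 = t_1 \;\mathrm{XOR}\; t_2 \\
&\;\;\vdots \\
k_1 &= s_{n-2} \;\mathrm{XOR}\; s_{n-1} = t_{n-2} \;\mathrm{XOR}\; t_{n-1} \\
k_0 &= s_{n-1} = t_{n-1}
\end{aligned}
\tag{A-6}
$$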


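We obtain the following relation from eq (A-6):

$s_i = t_i$ for $1 \le i \le n-1$, and

$$
\begin{cases}
s_0 = 0 \text{ and } t_0 = 1 & (\text{if } s_1 = 0) \\
s_0 = 1 \text{ and } t_0 = 0 & (\text{if } s_1 = 1)
\end{cases}
\tag{A-7}
$$

Eq (A-7) implies that s and t are in sequence.
In other words the difference of sequence number between
A(k) and A(l) is one.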


Let [A(j)] be [E,F] where

$$[E,F] = [E_0, E_1, \ldots, E_{m-1}, F_0, F_1, \ldots, F_{m-1}] \tag{A-8}$$

From eq (A-2)

$$
\begin{aligned}
E_k &= (1/N)\,\bigl([e](h(k)) + [f](h(k))\bigr) \\
F_k &= (1/N)\,\bigl([e](h(k)) - [f](h(k))\bigr)
\end{aligned}
\tag{A-9}
$$

where (h(k)) is the k-th column of matrix H(n-1).

Since in our case $[f] = [0, 0, \ldots, 0]$,

$E_k = F_k$, namely $A(l) = A(k)$ for $l = k + (N/2)$. Q.E.D.


(B) We can generalize the result of (A) further.
Zero all components of array [a(j)] such that
$2^k \le j \le N-1$ where $1 \le k \le n-1$; then

$$
\begin{aligned}
A(0) &= A(2^k) = A(2 \cdot 2^k) = A(3 \cdot 2^k) = \ldots = A((2^{n-k}-1) \cdot 2^k) \\
A(1) &= A(2^k + 1) = A(2 \cdot 2^k + 1) = A(3 \cdot 2^k + 1) = \ldots = A((2^{n-k}-1) \cdot 2^k + 1) \\
&\;\;\vdots \\
A(2^k - 1) &= A(2 \cdot 2^k - 1) = A(3 \cdot 2^k - 1) = A(4 \cdot 2^k - 1) = \ldots = A(2^n - 1)
\end{aligned}
$$

and in each group, for example $(A(1), A(2^k + 1), A(2 \cdot 2^k + 1), A(3 \cdot 2^k + 1), \ldots, A((2^{n-k}-1) \cdot 2^k + 1))$, $2^{n-k}$ consecutive
sequence numbers are included.


Proof:

It is apparent from the recursive definition of the
Hadamard transform matrix given in eq (1) and the proof
given in (A).
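
Properties (A) and (B) can also be checked numerically. The Python sketch below is an illustration only: the function names and the test input are assumptions, and the transform is taken to be the natural-order Hadamard transform with the 1/N scaling of eq. (A-2).

```python
from typing import List


def fwht(values: List[float]) -> List[float]:
    """Fast Hadamard transform in natural order (length must be a power of two)."""
    a = list(values)
    h = 1
    while h < len(a):
        for start in range(0, len(a), 2 * h):
            for i in range(start, start + h):
                a[i], a[i + h] = a[i] + a[i + h], a[i] - a[i + h]
        h *= 2
    return a


def sequence_number(index: int, n: int) -> int:
    """Count sign changes along row `index` of the natural-order matrix H(n)."""
    row = [(-1) ** bin(index & j).count("1") for j in range(1 << n)]
    return sum(1 for x, y in zip(row, row[1:]) if x != y)


def check_property_b(n: int, k: int) -> bool:
    """True if zeroing a(j) for j >= 2^k makes A(i) equal to A(i mod 2^k)."""
    size, period = 1 << n, 1 << k
    a = [float(j + 1) if j < period else 0.0 for j in range(size)]
    A = [v / size for v in fwht(a)]
    return all(abs(A[i] - A[i % period]) < 1e-12 for i in range(size))


def group_sequence_numbers(n: int, k: int, r: int) -> List[int]:
    """Sorted sequence numbers of the group A(r), A(r + 2^k), A(r + 2*2^k), ..."""
    return sorted(sequence_number(i, n) for i in range(r, 1 << n, 1 << k))
```

Taking k = n - 1 reproduces case (A): each group then holds two elements whose sequence numbers differ by one.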

